March 27, 2020
Unfortunately, not a day currently goes by without the coronavirus being in the news. The coronavirus has been putting the lives and health of people around the world at risk, and it is also causing secondary disasters such as economic decline.
Recently there has been news about another type of secondary disaster: coronavirus scams. These malicious crimes target vulnerable people by using the coronavirus as a threat, which is agonizing to see.
(See an example of the news here: https://www.newsweek.com/coronavirus-covid19-police-warning-scam-fake-cdc-experts-testing-1493459?fbclid=IwAR2jO2yaRVk-P2l9gQpbqTHkf_c8EO0wHn4FZVRjbArvGW1U5YKRiWDeOqU)
These depressing stories are why we decided to conduct a data analysis to uncover the relationship between the coronavirus and crime.
Specifically, we aim to discover a relationship between the coronavirus and crimes in New York City (NYC). Here, we focus on 2 types of crimes: burglary and robbery.
Definitions:
Burglary - entry into a building illegally with intent to commit a crime, especially theft.
Robbery - the action of taking property unlawfully from a person or place by force or threat of force.
(from Google Search's dictionary)
We have 4 reasons why we chose these 2 crimes:
The analyses we conduct are the following:
(We briefly explain the reasoning behind each analysis at the beginning of its section.)
Next, we preprocess the crime data before the analysis.
The data in SpotCrime was in the form of tables, which we copy-pasted into an Excel workbook.
(Although the data was tabular in SpotCrime, the copy-paste changed the format, so preprocessing was needed.)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Excel File can be found on Github
# example url:
# https://spotcrime.com/ny/new+york?fbclid=IwAR2FShJaYSxrtwilE4tXlZINExpSfSJcZWA5TnCWpOaiLQL6iA0hXAsmSTM#crime-info
burg_2018_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2018Burg')
rob_2018_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2018Rob')
burg_2019_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2019Burg')
rob_2019_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2019Rob')
burg_2020_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2020Burg')
rob_2020_df = pd.read_excel('data/NYCrimes.xlsx', header=None, sheet_name='Mar2020Rob')
# Data collected from https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary.pdf
corona_cases_df = pd.read_excel('data/CoronaNYCCases.xlsx', sheet_name='CoronaNYCCases')
# Function to organize the data into a clean form with the columns ['crime_type', 'date', 'address']
def organize_df(df):
    # Rows repeat in groups of three: crime type, date, address
    crime_type = df.iloc[0::3].reset_index(drop=True)
    date = df.iloc[1::3].reset_index(drop=True)
    address = df.iloc[2::3].reset_index(drop=True)
    # Place the three slices side by side as columns
    combined = pd.concat([crime_type, date, address], axis=1)
    combined.columns = ['crime_type', 'date', 'address']
    return combined
# Organize the data from each sheet, then concatenate all of them
burg_2018_df = organize_df(burg_2018_df)
rob_2018_df = organize_df(rob_2018_df)
burg_2019_df = organize_df(burg_2019_df)
rob_2019_df = organize_df(rob_2019_df)
burg_2020_df = organize_df(burg_2020_df)
rob_2020_df = organize_df(rob_2020_df)
full_df = pd.concat([burg_2018_df, rob_2018_df, burg_2019_df, rob_2019_df, burg_2020_df, rob_2020_df], axis=0)
full_df.reset_index(inplace=True, drop=True)
# Output full concatenated dataframe as csv file
full_df.to_csv('data/nycrimes_mar_2018_to_2020.csv')
# Read full burglary and robbery csv file
df = pd.read_csv('data/nycrimes_mar_2018_to_2020.csv', index_col=0, parse_dates=['date'])
df.head()
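To illustrate the reshaping that organize_df performs, here is a minimal sketch on a toy frame. The sample values and the helper name stack_to_table are made up for illustration; the sketch assumes, as the code above does, that each pasted sheet is a single column in which every record occupies three consecutive rows.

```python
import pandas as pd

# Toy single-column frame mimicking the pasted layout (values are made up)
raw = pd.DataFrame([
    "Burglary", "03/01/2020 10:00 AM", "100 EXAMPLE ST",
    "Robbery", "03/02/2020 11:30 PM", "200 SAMPLE AVE",
])

def stack_to_table(df):
    """Turn repeating (type, date, address) rows into three columns."""
    # Take every third row starting at offsets 0, 1 and 2
    parts = [df.iloc[i::3, 0].reset_index(drop=True) for i in range(3)]
    table = pd.concat(parts, axis=1)
    table.columns = ["crime_type", "date", "address"]
    return table

table = stack_to_table(raw)
print(table)
```

Resetting the index on each slice matters: without it, the three slices would keep their original row labels (0, 3, ... and 1, 4, ... and 2, 5, ...) and would not line up when concatenated side by side.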
The goal of this analysis is to see whether there is an abnormal increase/decrease in the total number of crimes that occurred during March 1st to 22nd, 2020, compared to previous years. If there is an obvious increase/decrease, then we may be able to relate that to the coronavirus crisis.
burglary_df = df[df['crime_type']=='Burglary']
robbery_df = df[df['crime_type']=='Robbery']
years_list = np.unique(df.date.dt.year)
burglary_count_l = []
robbery_count_l = []
len(df[df.date.dt.year==2018])
# Count the burglary and robbery records for each year
for year in years_list:
    burglary_count_l.append(len(burglary_df[burglary_df.date.dt.year==year]))
    robbery_count_l.append(len(robbery_df[robbery_df.date.dt.year==year]))
plt.bar(years_list, burglary_count_l)
plt.xticks(np.arange(years_list[0], years_list[-1]+1), labels=years_list)
plt.title('Total Number of Burglary Cases in NYC from 3/1~3/22 for years 2018~2020')
plt.xlabel('Year')
plt.ylabel('# of Burglary Cases')
plt.show()
plt.bar(years_list, robbery_count_l)
plt.xticks(np.arange(years_list[0], years_list[-1]+1), labels=years_list)
plt.title('Total Number of Robbery Cases in NYC from 3/1~3/22 for years 2018~2020')
plt.xlabel('Year')
plt.ylabel('# of Robbery Cases')
plt.show()
From the two bar plots, we cannot confidently say that there was an increase or decrease in burglary or robbery cases during March 1~22, 2020, compared to previous years. This means that despite the trend of more people staying at home, we cannot say that the total number of burglary and robbery cases has been decreasing.
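As an aside, the per-year totals behind the bar plots can also be computed in a single step with a groupby. This is only a sketch on made-up records; the frame toy is hypothetical and simply uses the same crime_type and date columns as the df used in the analysis.

```python
import pandas as pd

# Hypothetical records standing in for the real df (values are made up)
toy = pd.DataFrame({
    "crime_type": ["Burglary", "Robbery", "Burglary", "Burglary", "Robbery"],
    "date": pd.to_datetime(
        ["2018-03-01", "2018-03-02", "2019-03-05", "2020-03-10", "2020-03-11"]
    ),
})

# One row per year, one column per crime type; missing combinations become 0
counts = (
    toy.groupby([toy["date"].dt.year, "crime_type"])
       .size()
       .unstack(fill_value=0)
)
print(counts)
```

A table like this puts both crime types side by side, which makes the year-over-year comparison easier to read than two separate lists.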
Now let's look at the specific number of daily cases during March 1st to 22nd, 2020.
By looking at the daily number of burglary and robbery cases during the coronavirus crisis, we may be able to discover a relationship between burglary, robbery, and the coronavirus.
Below we plot the daily number of burglary cases from March 1st to March 22nd, 2020.
burglary_2020_df = burglary_df[burglary_df.date.dt.year==2020]
robbery_2020_df = robbery_df[robbery_df.date.dt.year==2020]
day_list = np.arange(1,23)
burglary_daily_count_l = []
robbery_daily_count_l = []
# Count the 2020 burglary and robbery records for each day of the month
for day in day_list:
    burglary_daily_count_l.append(len(burglary_2020_df[burglary_2020_df.date.dt.day==day]))
    robbery_daily_count_l.append(len(robbery_2020_df[robbery_2020_df.date.dt.day==day]))
plt.plot(day_list, burglary_daily_count_l)
plt.xticks(np.arange(day_list[0], day_list[-1]+1), labels=day_list)
plt.title('Number of Burglary Cases in NYC from 3/1~3/22/2020')
plt.xlabel('Day')
plt.ylabel('# of Burglary Cases')
plt.show()
There seems to be a decrease in the number of burglary cases as the days progress. Intuitively, this is because people are more likely to be inside buildings (homes) due to the coronavirus, which deters burglars from entering those buildings to commit crimes.
We plot the daily number of positive coronavirus cases below, and calculate the Pearson correlation.
plt.plot(day_list, corona_cases_df.Cases)
plt.xticks(np.arange(day_list[0], day_list[-1]+1), labels=day_list)
plt.title('Number of Daily Positive Coronavirus Cases in NYC from 3/1~3/22/2020')
plt.xlabel('Day')
plt.ylabel('# of Positive Coronavirus Cases')
plt.show()
np.corrcoef(burglary_daily_count_l, corona_cases_df.Cases)
We find a correlation of -0.6 between the daily number of burglary cases and positive coronavirus cases, which is a fairly strong negative correlation. A significance test for the correlation gave a p-value of 0.003158, which is significant at the 0.01 level.
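The text does not say which significance test was used. A minimal sketch, assuming the standard two-sided t-test for a Pearson correlation (t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom), is shown below using scipy.stats.pearsonr. The two series are made up for illustration and are not the NYC data. For reference, plugging r = -0.6 and n = 22 into that formula gives t of roughly -3.35 with 20 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Made-up daily series for illustration only (not the NYC data)
burglaries = np.array([30, 28, 31, 27, 25, 26, 22, 20, 21, 18])
positives = np.array([1, 2, 4, 8, 15, 30, 60, 120, 250, 500])

# Pearson correlation and its two-sided p-value
r, p_value = stats.pearsonr(burglaries, positives)

# Same test by hand: t statistic with n - 2 degrees of freedom
n = len(burglaries)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.3f}, p = {p_value:.4f}, p (manual) = {p_manual:.4f}")
```

With the real data, burglary_daily_count_l and corona_cases_df.Cases would take the place of the two toy arrays.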
In addition, when we calculated the correlation between the daily number of burglary cases from March 1st to 22nd in 2018 and 2019 and the number of positive coronavirus cases in 2020, the values were -0.187 and -0.090 respectively. This shows that there is usually no decreasing trend within the month of March, which supports the hypothesis that the increase in positive coronavirus cases is related to the daily number of burglary cases.
Also, note that we are not aware of any increase in security enforcement during this time, so it is plausible that the coronavirus has been affecting the number of burglary cases.
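The code for the 2018 and 2019 comparison is not shown in the post. One way it could be done is sketched below, with a hypothetical helper daily_counts and made-up inputs: count the records per day of the month for a given year, then correlate those counts with the 2020 case series.

```python
import numpy as np
import pandas as pd

def daily_counts(dates, year, days):
    """Count records per day of month for one year; days with no records get 0."""
    dates = pd.Series(pd.to_datetime(dates))
    in_year = dates[dates.dt.year == year]
    return in_year.dt.day.value_counts().reindex(days, fill_value=0).to_numpy()

# Made-up inputs for illustration only
days = np.arange(1, 6)
dates = ["2018-03-01", "2018-03-01", "2018-03-03", "2019-03-02", "2019-03-05"]
cases = np.array([1, 3, 9, 27, 81])

for year in (2018, 2019):
    counts = daily_counts(dates, year, days)
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
    r = np.corrcoef(counts, cases)[0, 1]
    print(year, counts, round(r, 3))
```

The reindex step keeps days with zero records in the series, so the counts always align with the case series day by day.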
Next, we plot the daily number of robbery cases from March 1st to March 22nd, 2020.
plt.plot(day_list, robbery_daily_count_l)
plt.xticks(np.arange(day_list[0], day_list[-1]+1), labels=day_list)
plt.title('Number of Robbery Cases in NYC from 3/1~3/22/2020')
plt.xlabel('Day')
plt.ylabel('# of Robbery Cases')
plt.show()
The number of robbery cases does not seem to change despite the increasing trend of people staying at home. Since robbery is theft by force or threat of force, it can be hypothesized that criminals will commit robbery regardless of whether a person is present. In other words, we may be able to say that robbers are as active as usual even during a pandemic.
We also calculate the correlation against the number of positive coronavirus cases.
np.corrcoef(robbery_daily_count_l, corona_cases_df.Cases)
Consistent with what the plot suggests, there is essentially no correlation (r = 0.09) between the daily number of robbery cases and positive coronavirus cases.
Next, we try to discover the areas where burglary and robbery are likely to occur, given the current situation of the coronavirus.
This analysis focuses on the locations of burglary and robbery cases using heatmap visualization through Carto. We want to examine whether there are any abnormal concentrations of cases in NYC, and this understanding may help us avoid and prevent crime.
from IPython.display import Image
Image(filename="img/2018.jpg",width=400,height=400)
In the crime heatmap for 2018, crimes are concentrated in Harlem (near The Studio Museum in Harlem), Hudson Heights (near Highbridge Park), the St. Mary's Park area, and Mapleton (near Washington Cemetery). In short, crimes in 2018 mostly happened in the Harlem area and the downtown Brooklyn area, with concentrations around restaurants, landmarks, and parks.
Image(filename="img/2019.jpg",width=400,height=400)
In March 2019, compared to March 2018, the crime concentrations shifted from the Upper Manhattan area to the West Bronx area, including Morris Heights, Mount Hope, and Mount Eden. New crime concentrations are located near the Sugar Hill area, the East Village, and the SoHo area.
Image(filename="img/2020.jpg",width=400,height=400)
In March 2020, the crimes tend to be much more spread out compared to previous years. This could be because more people (criminals included) are staying at their own homes instead of gathering in one location (workplace, restaurant, landmark). Moreover, crimes could happen more frequently at local stores due to the high demand for essential goods. New crime concentrations are the Bronx Zoo, Queens Botanical Garden, South Jamaica, the Midtown Manhattan area, and the Greenwich Village area, where many stores and residential communities are located.
Now let's focus only on burglary cases during March 1-22, 2018-2020.
Image(filename="img/2018_burglary.jpg",width=400,height=400)
Image(filename="img/2019_burglary.jpg",width=400,height=400)
Image(filename="img/2020_burglary.jpg",width=400,height=400)
We can observe that the burglary concentrations in 2020 are much more spread out compared to previous years. The concentrations have also shifted toward the Forest Park and Marble Hill areas, where some local markets are located.
Now let's focus only on robbery cases during March 1-22, 2018-2020.
Image(filename="img/2018_robbery.jpg",width=400,height=400)
Image(filename="img/2019_robbery.jpg",width=400,height=400)
Image(filename="img/2020_robbery.jpg",width=400,height=400)
Robbery cases in 2020 also show a more scattered distribution compared with previous years. During the coronavirus outbreak, new robbery concentrations are near the Midtown Manhattan area and the Lenox area, where more people used to gather before the outbreak.
In general though, burglary and robbery cases are distributed in roughly the same areas regardless of the coronavirus.
Overall, for the period March 1st to March 22nd, we cannot say with 100% confidence that the crime rate increased or decreased compared to previous years because of the coronavirus, but we had some interesting and plausible findings:
We hope that these results will help in some way to prevent and avoid crime during this pandemic.
Thank you.
By Luis Lu and Yuki Nishimura